
Covariance estimation methods

These differential equations depend on the entire probability density function p(x, t) for x(t). The evolution with time of the probability density function is governed, in principle, by Kolmogorov's forward equation (Jazwinski, 1970), although this equation has been solved in only a few simple cases (Bharucha-Reid, 1960). The implementation of practical algorithms for the computation of the estimate and its error covariance requires methods that do not depend on knowing p(x, t). [Pg.158]
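As a point of reference (standard notation assumed here, not taken from the source): for a diffusion model dx = f(x, t) dt + G(x, t) dw, where w is a Wiener process with E[dw dw^T] = Q dt, Kolmogorov's forward (Fokker-Planck) equation for p(x, t) reads

\frac{\partial p(\mathbf{x},t)}{\partial t} = -\sum_{i} \frac{\partial}{\partial x_i}\big[f_i(\mathbf{x},t)\,p\big] + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\Big[\big(\mathbf{G}\mathbf{Q}\mathbf{G}^{\mathrm{T}}\big)_{ij}\,p\Big]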

Only a few publications in the literature have dealt with this problem. Almasy and Mah (1984) presented a method for estimating the covariance matrix of measured errors by using the constraint residuals calculated from available process data. Darouach et al. (1989) and Keller et al. (1992) have extended this approach to deal with correlated measurements. Chen et al. (1997) extended the procedure further, developing a robust strategy for covariance estimation, which is insensitive to the presence of outliers in the data set. [Pg.203]

The covariance matrix of measurement errors is a very useful statistical property. Indirect methods can deal with unsteady sampling data, but they are very sensitive to outliers; the presence of even one or two outliers can produce misleading results. This drawback can be eliminated by using robust approaches via M-estimators. The performance of the robust covariance estimator is better than that of the indirect methods when outliers are present in the data set. [Pg.214]

The Mahalanobis distance used for multivariate outlier detection relies on the estimation of a covariance matrix (see Section 2.3.2), in this case preferably a robust covariance matrix. However, robust covariance estimators like the MCD estimator need more objects than variables, and thus for many applications with m > n this approach is not possible. For this situation, other multivariate outlier detection techniques can be used, like a method based on robustified principal components (Filzmoser et al. 2008). The R code to apply this method to a data set X is sketched below. [Pg.64]
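A minimal sketch, assuming the pcout() function of the mvoutlier R package (an implementation of the Filzmoser et al. 2008 approach); X is the n x m data matrix:

library(mvoutlier)                     # provides pcout() (assumption)
res <- pcout(X)                        # outlier detection via robustified principal components
outliers <- which(res$wfinal01 == 0)   # observations with final weight 0 are flagged as outliers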

The goal of robust PCA methods is to obtain principal components that are not influenced much by outliers. A first group of methods is obtained by replacing the classical covariance matrix with a robust covariance estimator, such as the reweighted MCD estimator [45] (Section 6.3.2). Let us reconsider the Hawkins-Bradu-Kass data in p = 4 dimensions. Robust PCA using the reweighted MCD estimator yields the score plot in Figure 6.7b. We now see that the center is correctly estimated in the... [Pg.187]
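A minimal sketch of this first group of methods in R, assuming covMcd() from the robustbase package for the reweighted MCD estimate:

library(robustbase)
mcd <- covMcd(X)                       # reweighted MCD location and scatter
eig <- eigen(mcd$cov)                  # eigenvectors of the robust covariance matrix
scores <- scale(X, center = mcd$center, scale = FALSE) %*% eig$vectors
# 'scores' holds the robust principal component scores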

Another approach to robust PCA has been proposed by Hubert et al. [52] and is called ROBPCA. This method combines ideas of both projection pursuit and robust covariance estimation. The projection pursuit part is used for the initial dimension reduction. Some ideas based on the MCD estimator are then applied to this lower-dimensional data space. Simulations have shown that this combined approach yields more accurate estimates than the raw projection pursuit algorithm RAPCA. The complete description of the ROBPCA method is quite involved, so here we will only outline the main stages of the algorithm. [Pg.189]
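ROBPCA is implemented in R; a minimal usage sketch, assuming the PcaHubert() function of the rrcov package:

library(rrcov)
pc <- PcaHubert(X, k = 2)    # ROBPCA retaining 2 components
getScores(pc)                # robust scores
getLoadings(pc)              # robust loadings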

Inclusion of the posthoc option instructs NONMEM to obtain the Bayesian post hoc ETA estimates when the first-order method is used. These effects and other relevant parameters can be output into a table using the table record. Thereafter, the distribution of the effects can be characterized, including skewness if present. Both the mixture model and the nonmixture models need to be reestimated with the first-order method, as one cannot meaningfully compare minimum objective function values (MOFs) between models differing only in estimation method. The MOF has dropped 676 points between the nonmixture model (see r5.txt) and the mixture model (r4.txt). Furthermore, the mixture model run has now concluded with a successful covariance step. A choice has to be made whether to make two plots (one for each subpopulation) or one (after all, the etas all share the same distribution). The latter approach is shown in Figure 28.2. Similar plots can be generated for each subpopulation. [Pg.730]

Covariate screening methods are used when there are a large number of covariates, such that evaluating every possible combination in a model is prohibitive. With this methodology, EBEs of the random effects are treated as data, and exploratory methods are then used to assess the relationship between the random effects and the covariate of interest. In other words, each individual's pharmacokinetic parameter, clearance for example, is estimated and treated as being known without measurement error. These Bayes estimates are then compared against subject-specific covariates for a relationship using either manual or automated methods. [Pg.235]
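As an illustration of the automated route, a minimal R sketch (all data and variable names here are hypothetical, not from the source), assuming a data frame ebe with one row per subject holding the EBE of clearance (eta.CL) and candidate covariates (WT, AGE):

screen <- lm(eta.CL ~ WT + AGE, data = ebe)   # regress the EBE on candidate covariates
summary(screen)    # covariates with significant coefficients are carried forward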

Wu and Wu (2002) compared three different covariate screening methods: a nonlinear least-squares-based (NL-based) method, an EBE-based method, and direct covariate screening by inclusion in the model and the LRT. In the NL-based method, the same model is fit to each individual using nonlinear regression and the parameter estimates for that subject are obtained. Correlation tests or regression-based models between the individual parameter estimates and individual covariates may then be used to determine whether a significant relationship exists between the variables. This method is difficult to implement in practice because it requires rich data for each subject. For Phase 3 studies, where sparse pharmacokinetic data are often collected, this method is impractical since many subjects will have insufficient data to support even simple pharmacokinetic... [Pg.239]

In summary, the Type I error rate from using the LRT to test for the inclusion of a covariate in a model was inflated when the data were heteroscedastic and an inappropriate estimation method was used. Type I error rates with FOCE-I were in general near nominal values under most conditions studied, suggesting that in most cases FOCE-I should be the estimation method of choice. In contrast, Type I error rates with FO-approximation and FOCE were very dependent on and sensitive to many factors, including the number of samples per subject, the number of subjects, and how the residual error was defined. The combination of high residual variability with sparse sampling was particularly disastrous using... [Pg.271]

Performs nonlinear regression using the Gauss-Newton estimation method. The x-data is given as x, while the y-data is given as y. The function, FUN, that is to be fitted must be written as an m-file. It takes three arguments: the coefficient values, x, and y (in this order). The function should be written to allow for matrix evaluation. The initial guess is specified in beta0. The vector beta contains the estimated values of the coefficients, the vector r contains the residuals, and covb is the estimated covariance matrix for the problem. J is the Jacobian matrix evaluated with the best estimate for the parameters. [Pg.343]

The primary purpose for expressing experimental data through model equations is to obtain a representation that can be used confidently for systematic interpolations and extrapolations, especially to multicomponent systems. The confidence placed in the calculations depends on the confidence placed in the data and in the model. Therefore, the method of parameter estimation should also provide measures of reliability for the calculated results. This reliability depends on the uncertainties in the parameters, which, with the statistical method of data reduction used here, are estimated from the parameter variance-covariance matrix. This matrix is obtained as a last step in the iterative calculation of the parameters. [Pg.102]

Finally, an important advantage of the Gauss-Newton method is that, at the end of the estimation, the covariance matrix of the parameter estimates is readily available alongside the best estimates, without any additional computations. Details will be given in Chapter 11. [Pg.55]

When the Gauss-Newton method is used to estimate the unknown parameters, we linearize the model equations and at each iteration we solve the corresponding linear least squares problem. As a result, the estimated parameter values have linear least squares properties. Namely, the parameter estimates are normally distributed, unbiased (i.e., E(k*) = k), and their covariance matrix is given by the expression sketched below. [Pg.177]
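In the unweighted case this is the standard linear least-squares result (notation assumed here; the source's own equation may carry weighting matrices), with J the Jacobian of the model with respect to the parameters at convergence and \hat{\sigma}^2 the estimated residual variance:

\mathrm{COV}(\mathbf{k}^{*}) \approx \hat{\sigma}^{2}\,\big(\mathbf{J}^{\mathrm{T}}\mathbf{J}\big)^{-1}, \qquad \hat{\sigma}^{2} = \frac{S(\mathbf{k}^{*})}{N - p}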

As already discussed in Chapter 11, matrix A calculated during each iteration of the Gauss-Newton method can be used to determine the covariance matrix of the estimated parameters, which in turn provides a measure of the accuracy of the parameter estimates (Tan and Kalogerakis, 1992). [Pg.376]

Let us review what we did with the depression example so far. First, we conjectured a taxon and three indicators. Next, we selected one of these indicators (anhedonia) as the input variable and the two other indicators (sadness and suicidality) as the output variables. Input and output are labels that refer to the role of an indicator in a given subanalysis. We cut the input indicator into intervals, hence the word "Cut" in the name of the method (Coherent Cut Kinetics), and we looked at the relationship between the output indicators. Specifically, we calculated covariances of the output indicators in each interval, hence the word "Kinetics" (we moved the calculations from interval to interval). Suppose that after all that was completed, we find a clear peak in the covariance of sadness and suicidality, which allows us to estimate the position of the hitmax and the taxon base rate. What next? Now we need to get multiple estimates of these parameters. To achieve this, we change the... [Pg.42]
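A minimal R sketch of this interval-wise covariance computation (a MAXCOV-style calculation; the indicator names follow the example, everything else is assumed):

# cut the input indicator (anhedonia) into intervals, then compute the
# covariance of the output indicators (sadness, suicidality) within each interval
intervals <- cut(anhedonia, breaks = 10)
covs <- tapply(seq_along(anhedonia), intervals,
               function(idx) cov(sadness[idx], suicidality[idx]))
plot(covs, type = "b")   # a clear peak locates the hitmax interval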

Equations (8.11) and (8.12) are approximate expressions for propagating the estimate and the error covariance, and in the literature they are referred to as the extended Kalman filter (EKF) propagation equations (Jazwinski, 1970). Other methods for dealing with the same problem are discussed in Gelb (1974) and Anderson and Moore (1979). [Pg.158]
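For reference, the standard continuous-time EKF propagation equations take the form (standard notation, not necessarily the source's):

\dot{\hat{\mathbf{x}}}(t) = \mathbf{f}(\hat{\mathbf{x}}, t), \qquad \dot{\mathbf{P}}(t) = \mathbf{F}(t)\mathbf{P}(t) + \mathbf{P}(t)\mathbf{F}^{\mathrm{T}}(t) + \mathbf{Q}(t), \qquad \mathbf{F}(t) = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}(t)}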

As pointed out by Keller et al. (1992), if the process is truly at steady state, then estimation by the so-called direct method using the sample variance and covariance is adequate and simple to use. Let y_i be the ith element of the vector of measured variables; the sample variance of the r repeated measurements of y_i is then given by the standard expression shown below. [Pg.203]
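In standard notation (assumed here), with y_{ij} the jth repeated measurement of y_i:

s_i^{2} = \frac{1}{r-1}\sum_{j=1}^{r}\big(y_{ij} - \bar{y}_i\big)^{2}, \qquad \bar{y}_i = \frac{1}{r}\sum_{j=1}^{r} y_{ij}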

This procedure (based on the sample variance and covariance) is referred to as the direct method of estimation of the covariance matrix of the measurement errors. As it stands, it makes no use of the inherent information content of the constraint equations, which has proved to be very useful in process data reconciliation. One shortcoming of this approach is that the r samples should be taken under steady-state operation in order to meet the independent sampling condition; otherwise, the direct method could give incorrect estimates. [Pg.203]
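In R, the direct method amounts to a single call, assuming Y is an r x m matrix whose rows are the r repeated steady-state measurement vectors (a sketch):

Sigma_hat <- cov(Y)    # sample covariance matrix of the measurement errors
diag(Sigma_hat)        # the sample variances of the individual measurements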

The indirect method uses Eq. (10.9) to estimate F. This procedure requires the value of the residual covariance matrix, which can be calculated from the residuals using the balance equations and the measurements. [Pg.204]

As was shown, the conventional method for data reconciliation is that of weighted least squares, in which the adjustments to the data are weighted by the inverse of the measurement noise covariance matrix so that the model constraints are satisfied. The main assumption of the conventional approach is that the errors follow a normal (Gaussian) distribution. When this assumption is satisfied, conventional approaches provide unbiased estimates of the plant states. The presence of gross errors violates the assumptions of the conventional approach and makes the results invalid. [Pg.218]
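For linear balance constraints, this weighted least-squares problem can be written as (a standard formulation, with y the measurement vector, Sigma the measurement noise covariance, and A the constraint matrix; notation assumed):

\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\;(\mathbf{y}-\mathbf{x})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\mathbf{x}) \quad \text{subject to} \quad \mathbf{A}\mathbf{x} = \mathbf{0}

which, for these linear constraints, has the closed-form solution

\hat{\mathbf{x}} = \mathbf{y} - \boldsymbol{\Sigma}\mathbf{A}^{\mathrm{T}}\big(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\mathrm{T}}\big)^{-1}\mathbf{A}\mathbf{y}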

Comment. The logistic tumor prevalence method is unbiased. It requires maximum likelihood estimation and allows for covariates and stratifying variables. It may be time-consuming and may have convergence problems with sparse tables (low tumor incidences) and clustering of tumors. [Pg.324]

In Sections 1.6.3 and 1.6.4, different possibilities were mentioned for estimating the central value and the spread, respectively, of the underlying data distribution. Also in the context of covariance and correlation, we assume an underlying distribution, but now this distribution is no longer univariate but multivariate, for instance a multivariate normal distribution. The covariance matrix Σ mentioned above expresses the covariance structure of the underlying (unknown) distribution. Now, we can measure n observations (objects) on all m variables, and we assume that these are random samples from the underlying population. The observations are represented as rows in the data matrix X(n x m) with n objects and m variables. The task is then to estimate the covariance matrix from the observed data X. Naturally, there exist several possibilities for estimating Σ (Table 2.2). The choice should depend on the distribution and quality of the data at hand. If the data follow a multivariate normal distribution, the classical covariance measure (which is the basis for the Pearson correlation) is the best choice. If the data distribution is skewed, one could either transform them to more symmetry and apply the classical methods, or alternatively... [Pg.54]
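A short R sketch of the two estimation routes discussed here (assuming the robustbase package for the robust alternative):

S_classical <- cov(X)         # classical covariance estimate (basis of the Pearson correlation)
library(robustbase)
S_robust <- covMcd(X)$cov     # robust MCD estimate, resistant to outliers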

The distance between object points is considered as an inverse similarity of the objects. This similarity depends on the variables used and on the distance measure applied. The distances between the objects can be collected in a distance matrix. Most used is the Euclidean distance, which is the commonly used distance extended to more than two or three dimensions. Other distance measures (city block distance, correlation coefficient) can be applied; of special importance is the Mahalanobis distance, which considers the spatial distribution of the object points (the correlation between the variables). Based on the Mahalanobis distance, multivariate outliers can be identified. The Mahalanobis distance is based on the covariance matrix of X; this matrix plays a central role in multivariate data analysis and should be estimated by appropriate methods, mostly robust ones.
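In R, robust Mahalanobis distances can be obtained by feeding robust location and scatter estimates to the base mahalanobis() function (a sketch, assuming robustbase):

library(robustbase)
mcd <- covMcd(X)                                           # robust location and scatter
d2 <- mahalanobis(X, center = mcd$center, cov = mcd$cov)   # squared Mahalanobis distances
outliers <- which(d2 > qchisq(0.975, df = ncol(X)))        # flag via chi-square cutoff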

