Influential Observations

Figure 36.5 shows the scores plots for PC2 v. PC1 (A) and PC4 v. PC3 (B). Such plots are useful in indicating a possible clustering of samples in subsets or the presence of influential observations. Again, a spectrum with a spike may show up as an outlier for that sample in one of the scores plots. If outliers are indicated, one should try to identify the cause of the outlying behaviour. Only when a satisfactory explanation is found can the outlier be safely omitted. In practice, one will... [Pg.361]

At each stage of the refinement of a new set of parameters, the hat matrix diagonal elements were calculated in order to detect influential observations following the criterion of Velleman and Welsch [8,9]. Inspection of the residuals of such reflections revealed those that were aberrant, but these aberrations progressively disappeared when the pseudo-atom model was used (introduction of multipolar coefficients). This fact confirms that the determination of the phases in acentric structures is improved by sophisticated models like the multipole density model. [Pg.301]

Cook RD (1977) Detecting influential observations in linear regression. Technometrics 19 15-18... [Pg.93]

Chatterjee S, Hadi AS (1986) Influential observations, high leverage points, and outliers in linear regression. Stat Sci 1(3) 379-416... [Pg.93]

Scarponi G, Moret I, Capodaglio G, Romanazzi M, Cross-validation, influential observations and selection of variables in chemometric studies of wines by principal component analysis, Journal of Chemometrics, 1990, 4, 217-240. [Pg.365]

Figure 2.4 Example of influential and non-influential observations. Top plot: Y-value is discordant from bulk of data but does not influence the estimate of the regression line. Middle plot: x-value is discordant from bulk of data but does not influence the estimate of the regression line. Bottom plot: x-value and Y-value are discordant from bulk of data and have a profound influence on the estimate of the regression line. Not all outlier observations are influential and not all influential observations are outliers.

Influential observations are ones that significantly affect the values of the parameter estimates, their standard errors, and the predicted values. One statistic used to detect influential observations has already been presented, the HAT matrix. An obvious way to detect these observations is to remove one observation at a time and examine how the recalculated parameter estimates compare to their original values. This is the row deletion approach to influence diagnostics, and at first glance it would appear that this process requires n iterations, a numerically intensive procedure. Statisticians, however, have derived equations that directly reflect the influence of the ith observation without iteration. One useful diagnostic is DFFITS... [Pg.72]
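The idea of obtaining deletion diagnostics without refitting can be sketched for the simplest case, a straight-line model. The function names and the example data below are illustrative assumptions, not taken from the source; the formulas are the standard textbook expressions for leverage, the externally studentized residual, and DFFITS.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def leverage(x):
    """Diagonal elements of the hat matrix for a straight-line model."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

def dffits(x, y):
    """DFFITS for every observation, using only quantities from the full fit."""
    n, p = len(x), 2  # p = number of estimated coefficients
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    h = leverage(x)
    sse = sum(e ** 2 for e in resid)
    out = []
    for e, hi in zip(resid, h):
        # Residual variance with observation i deleted, obtained algebraically.
        s2_del = (sse - e ** 2 / (1.0 - hi)) / (n - p - 1)
        t = e / math.sqrt(s2_del * (1.0 - hi))  # externally studentized residual
        out.append(t * math.sqrt(hi / (1.0 - hi)))
    return out

# Hypothetical data: the last point is discordant in both x and y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 12.0]
for i, d in enumerate(dffits(x, y), start=1):
    print(f"obs {i:2d}  DFFITS = {d:8.3f}")
```

The sketch assumes at least four observations and data that are not perfectly collinear after deleting any single point; otherwise the deletion variance is zero and the division fails.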

A large change in DFBETAS indicates that the ith observation has a significant impact on the value of a regression coefficient. As a yardstick for small to moderate sample sizes, absolute values of DFFITS and DFBETAS greater than 1 are indicative of influential observations. For larger sample sizes a smaller absolute value may be needed as a yardstick; one rule of thumb is 2n^(-0.5), that is 2/√n, for DFBETAS, and 2√(p/n) for DFFITS (Belsley, Kuh, and Welsch, 1980). [Pg.72]
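As a minimal sketch of these yardsticks, the code below computes DFBETAS for the slope of a straight-line model by explicit row deletion (chosen here for transparency rather than speed) and the size-adjusted cutoffs 2/√n for DFBETAS and 2√(p/n) for DFFITS. Function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def dfbetas_slope(x, y):
    """DFBETAS of the slope for each observation (x and y must be lists)."""
    n, p = len(x), 2
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)  # 1/sxx is the slope entry of (X'X)^-1
    _, b1 = fit_line(x, y)
    out = []
    for i in range(n):
        xd, yd = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(xd, yd)
        sse = sum((yj - (c0 + c1 * xj)) ** 2 for xj, yj in zip(xd, yd))
        s_del = math.sqrt(sse / (n - 1 - p))  # residual SD without observation i
        out.append((b1 - c1) / (s_del / math.sqrt(sxx)))
    return out

def size_adjusted_cutoffs(n, p):
    """Rule-of-thumb cutoffs attributed to Belsley, Kuh, and Welsch (1980)."""
    return {"DFBETAS": 2.0 / math.sqrt(n), "DFFITS": 2.0 * math.sqrt(p / n)}

# Hypothetical data: the last point pulls the slope down.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 12.0]
limit = size_adjusted_cutoffs(len(x), 2)["DFBETAS"]
flagged = [i + 1 for i, d in enumerate(dfbetas_slope(x, y)) if abs(d) > limit]
print("observations exceeding the DFBETAS cutoff:", flagged)
```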

Given the level of research activity devoted to identification of influential observations, considerably less effort has been devoted to what to do about them. Under guidelines (E9 Statistical Principles for Clinical Trials) developed by the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (1997), more commonly called ICH, several principles for dealing with outliers or influential observations are presented. First, data analysis should be defined prior to analyzing the data, preferably before data collection even begins. The data analysis plan should specify in detail how outliers or influential observations will be handled. Second, in the absence of a plan for handling outliers or influential observations, the analyst should do two analyses, one with and the other without the points in question, and the differences between the results should be presented in the discussion of the results. Lastly, identification of outliers should be based on statistical as well as scientific rationale, and the context of the data point should dictate how to deal with it. [Pg.73]

An area related to model validation is influence analysis, which deals with how stable the model parameters are to influential observations (either individual concentration values or individual subjects), and model robustness, which deals with how stable the model parameters are to perturbations in the input data. Influence analysis has been dealt with in previous chapters. The basic idea is to generate a series of new data sets, where each new data set consists of the original data set with one unique subject removed or has a different block of data removed, just like how jackknife data sets are generated. The model is refit to each of the new data sets and how the parameter estimates change with each new data set is determined. Ideally, no subject should show... [Pg.256]
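The delete-one-subject scheme described here can be sketched as follows. A pooled straight-line fit stands in for the actual pharmacokinetic model, and the subject labels, data, and function names are hypothetical; the percent change also assumes that no full-data estimate is zero.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = b0 + b1*x to (x, y) pairs."""
    n = len(points)
    xbar = sum(px for px, _ in points) / n
    ybar = sum(py for _, py in points) / n
    sxx = sum((px - xbar) ** 2 for px, _ in points)
    b1 = sum((px - xbar) * (py - ybar) for px, py in points) / sxx
    return ybar - b1 * xbar, b1

def leave_one_subject_out(data):
    """Refit with each subject removed; return full estimates and % changes.

    data maps a subject identifier to that subject's list of (x, y) pairs.
    """
    full = fit_line([pt for pts in data.values() for pt in pts])
    changes = {}
    for sid in data:
        reduced = [pt for other, pts in data.items() if other != sid for pt in pts]
        est = fit_line(reduced)
        # Percent change of each parameter relative to the full-data estimate.
        changes[sid] = tuple(100.0 * (e - f) / f for e, f in zip(est, full))
    return full, changes

# Hypothetical data set: subject "C" was sampled at high x and behaves differently.
data = {
    "A": [(1, 2.1), (2, 2.9), (3, 4.2)],
    "B": [(1, 1.9), (2, 3.1), (3, 3.9)],
    "C": [(8, 4.0), (9, 4.2), (10, 4.1)],
}
full, changes = leave_one_subject_out(data)
for sid, (d_int, d_slope) in changes.items():
    print(f"without {sid}: intercept {d_int:+.1f}%, slope {d_slope:+.1f}%")
```

A subject whose removal shifts a parameter far more than the others is a candidate influential subject and deserves a closer look before any decision to exclude it.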

Figure 7.19 Scatter plot of principal component analysis of data in Figure 7.18. The first three principal components are plotted. Subjects that are removed from the bulk of the data represent influential observations that may be influencing the parameter estimates overall. Potentially influential subjects are noted.

Parameter All data Outliers removed No outliers and no influential observations... [Pg.328]

Once a suitable covariate model is identified and no further model development will be done, the next step is to examine the dataset for outliers and influential observations. It may be that a few subjects are driving the inclusion of a covariate in a model or that a few observations are biasing the parameter estimates. Examination of the weighted residuals under Eq. (9.14) with the model estimates given in Table 9.15 showed that the distribution was skewed with two observations outside the acceptable limits of ±5. Patient 54 had an observed concentration of 4.05 mg/L 6-h postdose but had a predicted concentration of 1.22 mg/L, a difference of 2.83 mg/L and a corresponding weighted residual of +5.4. Patient 84 had an observed concentration of 1.57 mg/L 7.5-h postdose but had a... [Pg.328]

Figure 9.15 Scatter plots of the first three principal components of all structural parameters and variance components. Each index patient was singly removed from the data set and the model in Eq. (9.14) was refit using FOCE-I. The resulting matrix of structural parameters and variance components was then analyzed using principal components analysis. Influential observations are noted in the figures. Patient 100, who had a BSA of 2.52 m² and a BMI of 31.2 kg/m², is denoted as a solid square.
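A rough sketch of this kind of analysis, restricted to the first principal component: each row of the matrix holds the parameter estimates obtained with one patient removed, the columns are standardized, and the leading eigenvector of their correlation matrix is found by power iteration (a simple stand-in for a full eigendecomposition). The numbers are invented for illustration and are not the estimates behind the figure.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, k = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def first_pc_scores(rows, iterations=200):
    """Scores of each row on the first principal component."""
    z = standardize(rows)
    n, k = len(z), len(z[0])
    # Correlation matrix of the delete-one parameter estimates.
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(k)] for a in range(k)]
    v = [1.0 / math.sqrt(k)] * k  # starting vector for power iteration
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(zi[j] * v[j] for j in range(k)) for zi in z]

# Hypothetical delete-one estimates (rows: patient removed; columns: parameters).
estimates = [
    [1.00, 0.50, 0.20],
    [1.02, 0.49, 0.21],
    [0.99, 0.51, 0.19],
    [1.01, 0.50, 0.20],
    [1.60, 0.30, 0.35],  # removing this patient shifts every parameter
]
for patient, score in enumerate(first_pc_scores(estimates), start=1):
    print(f"patient {patient} removed: PC1 score = {score:+.2f}")
```

Rows whose scores sit far from the bulk correspond to patients whose removal changes the fit most. The sign of a principal component is arbitrary, so only the distance from the other points matters; a constant column would need to be dropped first, since its standard deviation is zero.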

The next step in the analysis is validating the model. As a first step, 1000 bootstrap data sets were created from the data set, excluding influential observations and patients. The best model as presented in Eq. (9.14) was then fit to each bootstrap distribution. Of the 1000... [Pg.331]

Atkinson, A.C. Two graphical displays for outlying and influential observations in regression. Biometrika 1981 68 13-20. [Pg.365]

Apart from the analysis of residuals, the recognition of outliers and of influential observations is important for the selection of a regression model. We will raise those questions for the generalized regression diagnostics in Section 6.2. [Pg.227]

Outliers should not be confused with influential observations. Until now, we have used the residuals in order to find problems with a model. If we want to study the robustness of a model to perturbations, we do an influence analysis. This kind of study is done as though the model were correct. Influential observations cannot be detected by large residuals. Their removal, however, may cause major changes in subsequent use of the model. The difference can be understood from Figure 6.7. A straight-line model that includes the influential observation will give a different slope if that observation is deleted. On the other hand, if the obvious outlier is included in the model, we will estimate larger residuals for all of the cases. [Pg.250]

To measure the change caused by the influential observation, the model has to be built both including and deleting it. From the two models, we obtain different estimations for the y values that can be used to compute a measure, the so-called Cook's distance, D... [Pg.250]
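The comparison of the two models can be sketched directly. Since the source's own equation is elided, the standard definition is assumed here: the squared shift in all fitted y values when observation i is deleted, divided by p times the residual variance of the full model. The example uses a straight-line model with hypothetical data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def cooks_distance(x, y):
    """Cook's distance for each observation (x and y must be lists)."""
    n, p = len(x), 2  # p = number of estimated coefficients
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    out = []
    for i in range(n):
        # Model built without observation i.
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared shift of the predictions at all n design points.
        shift = sum((fi - (c0 + c1 * xi)) ** 2 for xi, fi in zip(x, fitted))
        out.append(shift / (p * s2))
    return out

# Hypothetical data: the last point is far from the others in x and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 12.0]
for i, d in enumerate(cooks_distance(x, y), start=1):
    print(f"obs {i:2d}  D = {d:7.3f}")
```

Explicit deletion is used so that the code mirrors the two-model description; the same values follow from the residual and leverage of the full fit alone.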

Figure 6.7 Straight-line modeling in the presence of an outlier and an influential observation.

A typical outlier is sample number 17. The sample has a very large residual, but cannot be identified as an influential observation. [Pg.253]

Figure 6.10 Regression diagnostics for influential observations and outliers. (a) Cook's distance for recognition of influential observations, (b) Residual plot in dependence on the calculated y values, (c) Jackknifed residuals according to Eq. (6.102).

Big Chemical Encyclopedia
