
Factor analysis statistical assumptions

Multiple linear regression is strictly a parametric, supervised learning technique. A parametric technique is one which assumes that the variables conform to some distribution (often the Gaussian distribution); the properties of that distribution are built into the underlying statistical method. A non-parametric technique does not rely on the assumption of any particular distribution. A supervised learning method is one which uses information about the dependent variable to derive the model; an unsupervised learning method does not. Thus cluster analysis, principal components analysis and factor analysis are all examples of unsupervised learning techniques. [Pg.719]
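To make the distinction concrete, here is a minimal sketch with invented data; scikit-learn is used purely as an illustrative tool, not as anything named in the source. The supervised model uses the dependent variable y to fit; the unsupervised one ignores it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression   # supervised, parametric
from sklearn.decomposition import PCA               # unsupervised

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # independent variables (invented)
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Supervised: the dependent variable y guides the model fit.
mlr = LinearRegression().fit(X, y)

# Unsupervised: only X is used; no dependent variable is involved.
pca = PCA(n_components=2).fit(X)
print(mlr.coef_, pca.explained_variance_ratio_)
```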

In contrast to PCA, which can be considered a method of basis rotation, factor analysis is based on a statistical model with certain model assumptions. Like PCA, factor analysis also results in dimension reduction, but while the PCs are derived simply by optimizing a statistical criterion (spread, variance), the factors are intended to have a real meaning and an interpretation. Only a very brief introduction is given here; a classical book about factor analysis in chemistry is by Malinowski (2002), and many other books on factor analysis are available (Basilevsky 1994; Harman 1976; Johnson and Wichern 2002). [Pg.96]
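A hedged sketch of the contrast, with invented data: PCA rotates the basis to maximize variance, while factor analysis fits a latent-variable model (observed variables = loadings × factors + noise) whose factors are meant to carry an interpretation.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(1)
# Two latent factors generating ten observed variables plus noise (invented).
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.3 * rng.normal(size=(200, 10))

# PCA: basis rotation optimizing a variance criterion.
pca = PCA(n_components=2).fit(X)

# Factor analysis: a statistical model with noise terms, whose factors
# are intended to be interpretable.
fa = FactorAnalysis(n_components=2).fit(X)
print(pca.components_.shape, fa.components_.shape)
```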

Odhiambo and Manene presented a performance analysis of stepwise screening that assumes σ² > 0, where statistical tests are fallible even if all assumptions are correct. They derived expected values of the number of runs required, the number of factors mistakenly classified as active, and the number of factors mistakenly classified as not active, in terms of p, f, k, and the significance level and power of the tests used. These expressions are fairly complicated and are not repeated here, but Odhiambo and Manene also provide simpler approximations that are appropriate for small values of p. [Pg.200]

Further statistical analyses can be used to determine the relative influence that any factor or set of factors has on the total variation (global uncertainty). One such method is the analysis of variance (ANOVA), an important technique for analyzing the effects of categorical factors on a response. However, the normality of the data has to be checked before ANOVA is used to decompose the variability in the response variable among the different factors. Depending on the type of analysis, it may be important to determine (a) which factors have a significant effect on the response, and/or (b) how much of the variability in the response variable is attributable to each factor (as described in the statistical software STATGRAPHICS v5.0). [Pg.309]
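A minimal sketch of that workflow with SciPy, on invented group data: check normality first (here via the Shapiro-Wilk test, one reasonable choice), then run a one-way ANOVA on the categorical factor.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Three levels of a categorical factor, 30 observations each (invented).
groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (5.0, 5.5, 7.0)]

# Normality check on each group before using ANOVA.
for g in groups:
    w, p = stats.shapiro(g)
    print(f"Shapiro-Wilk p = {p:.3f}")   # p > 0.05: no evidence against normality

# One-way ANOVA: does the factor have a significant effect on the response?
f_stat, p_value = stats.f_oneway(*groups)
print(f"ANOVA F = {f_stat:.2f}, p = {p_value:.4f}")
```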

The statistical analysis approach is to calculate 95% confidence intervals for the proportion of participants in each group (placebo and combined active) reporting a headache. This approach is reasonable because the sample size is sufficiently large (that is, the values np in each group are at least five). Satisfying this assumption enables us to use the Z distribution for the reliability factor. [Pg.105]
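A worked sketch of that interval with invented counts: p̂ ± z·√(p̂(1−p̂)/n), with z = 1.96 as the reliability factor, after checking the large-sample condition.

```python
import math

n, successes = 120, 30            # participants and headaches (invented counts)
p_hat = successes / n
z = 1.96                          # reliability factor from the Z distribution

# Large-sample condition: n*p and n*(1-p) should each be at least five.
assert n * p_hat >= 5 and n * (1 - p_hat) >= 5

half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI: ({p_hat - half_width:.3f}, {p_hat + half_width:.3f})")
```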

In the interpretation of the numerical results that can be extracted from Mössbauer spectroscopic data, it is necessary to recognize three sources of error that can affect the accuracy of the data. These three contributions to the experimental error, which may not always be distinguishable from each other, can be identified as (a) statistical, (b) systematic, and (c) model-dependent errors. The statistical error, which arises from the fact that a finite number of observations are made in order to evaluate a given parameter, is the most readily estimated from the conditions of the experiment, provided that a Gaussian error distribution is assumed. Systematic errors are those that arise from factors influencing the absolute value of an experimental parameter but not necessarily the internal consistency of the data. Hence, such errors are the most difficult to diagnose, and their evaluation commonly involves measurements by entirely independent experimental procedures. Finally, the model-dependent errors arise from the application of a theoretical model that may have only limited applicability in the interpretation of the experimental data. The errors introduced in this manner can often be estimated by a careful analysis of the fundamental assumptions incorporated in the theoretical treatment. [Pg.519]
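As a hedged illustration of the statistical error only (an assumption on my part, not a procedure given in the source): Mössbauer spectra are counting experiments, so the uncertainty of a channel with N counts follows Poisson statistics, σ = √N, which is well approximated by a Gaussian when N is large.

```python
import math

counts = 40_000                       # counts in one spectral channel (invented)
sigma = math.sqrt(counts)             # absolute statistical (counting) error
print(f"relative statistical error: {sigma / counts:.2%}")  # roughly 0.5%
```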

To demonstrate that this was not the case, the shaded trays shown in Figure 4A were sampled, and for each tray 3 × 2 vials were assayed for protein content. The protein content results were analyzed with a two-cell analysis-of-variance model including a factor (the left/right positioning) and a covariate (the shelf number). In order to increase the power of the statistical testing, the shelf number was handled as a covariate and not as a factor, based on the assumption that the filling was progressing at a constant rate. [Pg.580]
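A sketch of that kind of model with statsmodels; the data frame and numbers are invented, and the formula is one reasonable encoding of "a factor plus a covariate", not the authors' actual code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "side": np.tile(["left", "right"], 30),        # factor: left/right position
    "shelf": np.repeat(np.arange(1, 7), 10),       # covariate: shelf number
})
df["protein"] = 10 + 0.05 * df["shelf"] + rng.normal(scale=0.2, size=60)

# Shelf enters as a covariate (constant filling rate assumed), not a factor.
model = smf.ols("protein ~ C(side) + shelf", data=df).fit()
print(anova_lm(model, typ=2))
```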

In a statistically designed corrosion experiment the goal is to infer a cause and effect relationship between a treatment variable and a dependent variable. By whatever means this is achieved, there is an implicit assumption that, except for the treatment itself, there is an equivalence between the sample of units that received a treatment and the sample (controls) that did not. In statistical analysis of results from experiments, all tests start with this assumption of equivalence between or among groups in all regards except for the factors being tested. Experimental designs are intended to assure, or at least approximate, conditions that make that assumption credible. [Pg.56]
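A minimal sketch (unit labels invented) of how random assignment underpins this equivalence assumption: units are allocated to treatment and control at random, so the groups differ, on average, only in the treatment itself.

```python
import numpy as np

rng = np.random.default_rng(4)
units = np.arange(20)                 # e.g., corrosion test coupons (invented)
shuffled = rng.permutation(units)
treated, controls = shuffled[:10], shuffled[10:]
print("treated:", sorted(treated))
print("controls:", sorted(controls))
```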

If the assumption that only two parameters are important to activity is troublesome, an alternative approach would be the use of discriminant analysis. This technique uses any number of parameters to attempt to discriminate between active and inactive compounds, or any other grouping of properties. The average values of a property for the active set and the inactive set are compared to the average value of the same property for both sets combined. If they are statistically different, then the factor involved can be used as a discriminant. A good example of this approach can be found in a report by Martin et al. Additional examples can be found in the work of Franke and Meisske, Henry and Block, and Ogino et al. ... [Pg.153]
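A hedged sketch of the idea, with invented property data: screen each parameter for a significant mean difference between the active and inactive sets (here via a t-test, one reasonable choice), then combine the parameters in a linear discriminant.

```python
import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
X_active = rng.normal(loc=[1.0, 0.0, 2.0], size=(25, 3))    # invented properties
X_inactive = rng.normal(loc=[0.0, 0.0, 0.5], size=(25, 3))

# Screen each parameter: do its means differ between the two sets?
for j in range(3):
    t, p = stats.ttest_ind(X_active[:, j], X_inactive[:, j])
    print(f"parameter {j}: p = {p:.3f}")

# Use the parameters together in a linear discriminant.
X = np.vstack([X_active, X_inactive])
y = np.array([1] * 25 + [0] * 25)                 # 1 = active, 0 = inactive
lda = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", lda.score(X, y))
```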

The result summaries can be examined by main effects plots, which show the mean value of each output at each level of each factor. When there are replications, as here, it is also possible to investigate the effect of the parameters on the variability of the responses. The standard deviation of the responses over the three replications represents the variability of the output. A log transformation of the standard deviations is applied to ensure that the assumptions of approximate normality and constant variability hold. Although these assumptions are not necessary for main effects plots, they are necessary for the statistical analysis described below. [Pg.317]
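A hedged sketch of that variability analysis on an invented design: compute the standard deviation over the three replications of each run, log-transform it, and average it at each factor level (a main effect on variability).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
# Four runs of a two-level factor, each replicated three times (invented).
df = pd.DataFrame({
    "factor_A": np.repeat(["low", "low", "high", "high"], 3),
    "run": np.repeat([1, 2, 3, 4], 3),
    "response": rng.normal(loc=10, scale=1.0, size=12),
})

# Standard deviation of the three replications per run, then log-transform.
sd = df.groupby(["run", "factor_A"])["response"].std().reset_index()
sd["log_sd"] = np.log(sd["response"])

# Main effect of factor_A on variability: mean log(SD) at each level.
print(sd.groupby("factor_A")["log_sd"].mean())
```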

