Distributions, selection multivariate normal distribution

Two datasets are first simulated. The first contains only normal samples, whereas the second contains 3 outliers; they are shown in Plots A and B of Figure 2, respectively. For each dataset, a fixed percentage (70%) of the samples is randomly selected to build a linear regression model, and its slope and intercept are recorded. Repeating this procedure 1000 times yields 1000 values each for the slope and the intercept. For both datasets, the intercept is plotted against the slope, as displayed in Plots C and D, respectively. The joint distribution of the intercept and slope for the normal dataset appears to be multivariate normal. In contrast, the corresponding distribution for the dataset with outliers looks quite different and is far from normal. The distributions of the slopes alone for both datasets are shown in Plots E and F. These results show that outliers can greatly influence a regression model, which is reflected in the odd distributions of both slopes and intercepts. Conversely, a distribution of a model parameter that is far from normal would, most likely, indicate some abnormality in the data. [Pg.5]
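
The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the original authors' code: the sample size (50), the true line (slope 2, intercept 1), the noise level, and the positions and sizes of the three outliers are all assumptions made for the example, and `subsample_fits` is a hypothetical helper name.

```python
import numpy as np

def subsample_fits(x, y, frac=0.7, n_rep=1000, seed=0):
    """Fit y = slope * x + intercept on random subsets of the data.

    Returns an (n_rep, 2) array whose columns are [slope, intercept].
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    k = int(round(frac * n))  # number of samples kept in each repetition
    fits = np.empty((n_rep, 2))
    for i in range(n_rep):
        idx = rng.choice(n, size=k, replace=False)
        fits[i] = np.polyfit(x[idx], y[idx], deg=1)  # [slope, intercept]
    return fits

# Assumed toy data: a straight line with Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
y_clean = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Second dataset: same data with three artificial outliers (assumed values).
y_out = y_clean.copy()
y_out[[5, 25, 45]] += np.array([15.0, -12.0, 18.0])

fits_clean = subsample_fits(x, y_clean)
fits_out = subsample_fits(x, y_out)

print("slope std, clean data:   ", fits_clean[:, 0].std())
print("slope std, with outliers:", fits_out[:, 0].std())
```

Plotting `fits_clean[:, 1]` against `fits_clean[:, 0]` (and likewise for `fits_out`) gives scatter plots of the kind described for Plots C and D. With outliers present, the fitted parameters depend strongly on which outliers happen to fall in each subset, so the spread is larger and the distribution tends to break into clusters.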

To assess the behaviour of biomarker selection for larger data sets, we resort to simulation. Simulated data sets have been constructed as multivariate normal distributions, using the means and covariance matrices of the experimental data; both classes (untreated and spiked) have been simulated separately. Simulations are performed for both positive and negative modes; in every simulation, one hundred data sets are created. The outcomes reported here are the averages of the results for the one hundred simulations. Data sets consisting of 10, 25, 50 and 200 biological samples per class have been synthesized. [Pg.148]
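
A simulation of this kind might look like the sketch below. The "experimental" matrices here are random stand-ins (20 samples by 5 variables, with variable 2 shifted in the spiked class), and the per-variable mean difference is only a placeholder outcome; the excerpt does not say which selection statistic was actually averaged. `simulate_class` and `mean_difference` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for experimental data (assumption): 20 samples x 5 variables per class.
untreated = rng.normal(size=(20, 5))
spiked = rng.normal(size=(20, 5)) + np.array([0.0, 0.0, 3.0, 0.0, 0.0])

def simulate_class(data, n_samples, rng):
    """Draw n_samples from a multivariate normal with the mean and
    covariance matrix estimated from one class of experimental data."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

def mean_difference(n_per_class, n_sim=100):
    """Average, over n_sim simulated data sets, of the per-variable
    difference in class means (placeholder for a selection statistic)."""
    diffs = np.empty((n_sim, untreated.shape[1]))
    for s in range(n_sim):
        a = simulate_class(untreated, n_per_class, rng)  # classes are
        b = simulate_class(spiked, n_per_class, rng)     # simulated separately
        diffs[s] = b.mean(axis=0) - a.mean(axis=0)
    return diffs.mean(axis=0)

# Class sizes taken from the text: 10, 25, 50 and 200 samples per class.
results = {n: mean_difference(n) for n in (10, 25, 50, 200)}
for n, avg in results.items():
    print(n, np.round(avg, 2))
```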

When the EP comprises linear computations (linear in the observations) such as simple differences, y - B, or linear least squares or linear multivariate computations, initial normality (of the observations y) is preserved for the estimated quantities. Non-linear computations, such as arise commonly in iterative model selection and peak search routines, produce estimated parameters having non-normal distributions (59). Caution is in order, in those cases, in applying "normal" values of test statistics to calculate 1 and CI's. (Other factors to consider are the extent of non-linearity, the level of confidence or significance [1-α], and the robustness of the statistic in question.)... [Pg.27]
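
The contrast between linear and non-linear computations can be checked numerically. In the sketch below, the linear case is the simple difference y - B, and the non-linear case is a crude stand-in for a peak search: taking the maximum over 20 noisy channels. The channel count, noise levels and sample size are assumptions chosen for illustration only.

```python
import numpy as np

def skewness(v):
    """Sample skewness: third standardized moment (0 for a normal distribution)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(7)
n_trials = 100_000

# Linear computation: net signal = gross observation y minus blank B.
y = rng.normal(loc=10.0, scale=1.0, size=n_trials)
B = rng.normal(loc=4.0, scale=1.0, size=n_trials)
net = y - B  # a linear combination of normal variables stays normal

# Non-linear computation (assumed stand-in for a peak search):
# report the largest value among 20 pure-noise channels.
channels = rng.normal(size=(n_trials, 20))
peak = channels.max(axis=1)

skew_linear = skewness(net)
skew_nonlinear = skewness(peak)
print("skewness of y - B:      ", round(skew_linear, 3))
print("skewness of max channel:", round(skew_nonlinear, 3))
```

The difference has skewness close to zero, as expected for a normal variable, while the maximum is clearly right-skewed, so critical values taken from normal tables would misstate its tail probabilities.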

In a real-world structural identification application, where no information is available regarding the true pdfs of the input random vector, one could fit the environmental condition data to a parametric distribution by maximum likelihood estimation. The results of fitting the data to pdfs in this way are shown in Fig. 5. Based on this fitting, and after transforming the pdf of the mass load into a normal distribution by taking the natural logarithm, the Hermite polynomials may be selected for the construction of the multivariate PC basis functions. [Pg.3504]
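
For a single input, the steps above can be sketched as follows, assuming for illustration that the mass load is lognormally distributed (the synthetic data and its parameters are invented). For a lognormal variable, the maximum likelihood estimates are the mean and the (biased) standard deviation of the log-transformed data. The probabilists' Hermite polynomials from `numpy.polynomial.hermite_e` are then evaluated on the standardized variable; a multivariate PC basis would be built from products of such one-dimensional polynomials.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

rng = np.random.default_rng(1)

# Synthetic stand-in for measured mass-load data (assumed lognormal).
mass_load = rng.lognormal(mean=2.0, sigma=0.3, size=5000)

# Natural-log transform: if mass_load is lognormal, log_load is normal.
log_load = np.log(mass_load)

# Maximum likelihood estimates of the normal parameters of log_load.
mu_hat = log_load.mean()
sigma_hat = log_load.std()  # ddof=0 is the MLE of the standard deviation

# Standardized variable, approximately N(0, 1).
xi = (log_load - mu_hat) / sigma_hat

# One-dimensional basis He_0 ... He_3 evaluated at every sample:
# columns are 1, xi, xi^2 - 1, xi^3 - 3*xi.
order = 3
Psi = He.hermevander(xi, order)

# Empirical Gram matrix; for an exact N(0, 1) variable it is diag(0!, 1!, 2!, 3!).
gram = Psi.T @ Psi / len(xi)
print("mu_hat =", round(mu_hat, 3), " sigma_hat =", round(sigma_hat, 3))
print(np.round(gram, 2))
```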
