
Discriminant analysis distribution

We also make a distinction between parametric and non-parametric techniques. In parametric techniques such as linear discriminant analysis, UNEQ and SIMCA, statistical parameters of the distribution of the objects are used in the derivation of the decision function (almost always a multivariate normal distribution... [Pg.212]

The Mahalanobis distance representation helps us take a more general look at discriminant analysis. The multivariate normal distribution for m variables and class K can be described by... [Pg.221]
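The snippet breaks off before the formula; written out from the usual definition (the symbols m for the number of variables, μK for the class centroid and ΣK for the class covariance matrix are assumed notation, since the source is truncated), the density is

```latex
f_K(\mathbf{x}) \;=\; \frac{1}{(2\pi)^{m/2}\,\lvert\boldsymbol{\Sigma}_K\rvert^{1/2}}
\exp\!\left[-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu}_K)^{\mathsf{T}}
\boldsymbol{\Sigma}_K^{-1}(\mathbf{x}-\boldsymbol{\mu}_K)\right]
```

The exponent is minus one half of the squared Mahalanobis distance of x from the centroid of class K, which is what makes the Mahalanobis representation the natural bridge to discriminant analysis.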

A number of chemometric tools have been employed for these classifications, including partial least squares combined with hierarchical cluster analysis (PLS-HCA) for Viagra tablets [98] and antimalarial artesunate tablets [99]. De Peinder et al. used partial least squares discriminant analysis (PLS-DA) models to distinguish genuine from counterfeit Lipitor tablets even when the real API was present [100]. The counterfeit samples were also found to have poorer API distribution than the genuine ones, based on spectra collected in a cross pattern on the tablet. [Pg.217]

Miller, M.D. and Zhang, Y. (2006) A hybrid mixture discriminant analysis-random forest computational model for the prediction of volume of distribution in human. Journal of Medicinal Chemistry,... [Pg.220]

Quadratic discriminant analysis (QDA) is a probabilistic parametric classification technique that represents an evolution of LDA for nonlinear class separations. Like LDA, QDA is based on the hypothesis that the probability density distributions are multivariate normal but, in this case, the dispersion is not the same for all of the categories. It follows that the categories differ not only in the position of their centroids but also in their variance-covariance matrices (different location and dispersion), as represented in Fig. 2.16A. Consequently, the ellipses of different categories differ not only in their position in the plane but also in eccentricity and axis orientation (Geisser, 1964). By connecting the intersection points of each pair of corresponding ellipses (at the same Mahalanobis distance from the respective centroids), a parabolic delimiter is identified (see Fig. 2.16B). The name quadratic discriminant analysis derives from this feature. [Pg.88]
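A minimal scikit-learn sketch of the contrast just described (the two-Gaussian data set, random seed and all parameter values are illustrative assumptions, not taken from the cited source):

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Two classes with different centroids AND different covariance matrices:
# the situation where QDA's quadratic delimiter outperforms a linear one.
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 0.5]], 200)
X2 = rng.multivariate_normal([2.0, 2.0], [[0.4, -0.2], [-0.2, 1.5]], 200)
X = np.vstack([X1, X2])
y = np.array([0] * 200 + [1] * 200)

lda = LinearDiscriminantAnalysis().fit(X, y)     # one pooled covariance
qda = QuadraticDiscriminantAnalysis().fit(X, y)  # one covariance per class
print("LDA accuracy:", lda.score(X, y))
print("QDA accuracy:", qda.score(X, y))
```

Because the class dispersions differ, QDA's class-specific covariance estimates typically give the higher score on such data.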

Current methods for supervised pattern recognition are numerous. Typical linear methods are linear discriminant analysis (LDA), based on distance calculation; soft independent modeling of class analogy (SIMCA), which emphasizes similarities within a class; and PLS discriminant analysis (PLS-DA), which performs regression between spectra and class memberships. More advanced methods are based on nonlinear techniques, such as neural networks. A further distinction is between parametric and nonparametric computations. In parametric techniques such as LDA, statistical parameters of the normal sample distribution are used in the decision rules. Such restrictions do not apply to nonparametric methods such as SIMCA, which perform more efficiently on NIR data collections. [Pg.398]
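PLS-DA is conventionally implemented as a PLS regression against dummy-coded class membership, assigning each sample to the class with the largest predicted response. A hedged sketch of that construction (synthetic "spectra"; the band position, offset and number of latent variables are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
# Synthetic spectra: 60 samples x 50 wavelengths, two classes that
# differ by a small systematic offset on one band of variables.
X = rng.normal(size=(60, 50))
y = np.repeat([0, 1], 30)
X[y == 1, 10:20] += 0.8

Y = np.eye(2)[y]                         # dummy-coded class membership
pls = PLSRegression(n_components=3).fit(X, Y)
pred = pls.predict(X).argmax(axis=1)     # class with largest response
print("training accuracy:", (pred == y).mean())
```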

It is interesting to note that QSAR/QSPR models built with different methods can differ greatly in both complexity and predictivity. For example, a simple QSPR equation with three parameters can predict logP within one unit of measured values (43), while a complex hybrid mixture discriminant analysis-random forest model with 31 computed descriptors can only predict the volume of distribution of drugs in humans within about twofold of experimental values (44). The volume of distribution is a more complex property than the partition coefficient: the former is a physiological property with much higher uncertainty in its experimental measurement, while logP is a much simpler physicochemical property that can be measured more accurately. These and other factors dictate whether a good predictive model can be built. [Pg.41]

Discriminant analysis (DA) performs sample classification with an a priori hypothesis, based on a previously determined TCA or other CA protocols. DA is also called "discriminant function analysis," and its natural extension, MDA (multiple discriminant analysis), is sometimes named "discriminant factor analysis" or CDA (canonical discriminant analysis). Among these types of analyses, linear discriminant analysis (LDA) has been widely used to highlight differences among sample classes. Further classification methods are QDA (quadratic discriminant analysis) (Frank and Friedman, 1989), an extension of LDA, and RDA (regularized discriminant analysis), which works better with varying class distributions and with high-dimensional data, being a compromise between LDA and QDA (Friedman, 1989). [Pg.94]
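The compromise RDA strikes can be made concrete: each class covariance matrix is shrunk toward the pooled covariance by a parameter λ, so that λ = 0 recovers the class-specific dispersions of QDA and λ = 1 recovers the single pooled dispersion of LDA. The function below is a simplified numpy sketch of that idea (names are hypothetical; Friedman's published estimator additionally shrinks toward a scaled identity matrix via a second parameter):

```python
import numpy as np

def rda_covariances(X, y, lam):
    """Blend class-specific covariances with the pooled covariance.

    lam = 0.0 -> per-class covariances (QDA-like)
    lam = 1.0 -> single pooled covariance (LDA-like)
    """
    classes = np.unique(y)
    per_class = {k: np.cov(X[y == k].T) for k in classes}
    # Pooled covariance, weighted by class degrees of freedom.
    n, g = len(y), len(classes)
    pooled = sum((np.sum(y == k) - 1) * per_class[k] for k in classes)
    pooled = pooled / (n - g)
    return {k: (1 - lam) * per_class[k] + lam * pooled for k in classes}
```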

Two non-parametric methods for hypothesis testing with PCA and PLS are cross-validation and the jackknife estimate of variance. Both methods are described in some detail in the sections describing the PCA and PLS algorithms. Cross-validation is used to assess the predictive property of a PCA or a PLS model. The distribution function of the cross-validation test statistic cvd-sd under the null hypothesis is not well known. However, for PLS, the distribution of cvd-sd has been determined empirically by computer simulation [24] for some particular types of experimental designs. In particular, the discriminant-analysis (or ANOVA-like) PLS analysis has been investigated in some detail, as has the situation where Y is one-dimensional. The reader is referred to this simulation study for detailed information; some tables of the critical values of cvd-sd at the 5% level are given in Appendix C. [Pg.312]
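The cvd-sd statistic and its simulated critical values are specific to the cited work and are not reproduced here, but the underlying cross-validation computation for a PLS model can be sketched generically (scikit-learn based; the function name and parameter defaults are illustrative assumptions):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def press_by_components(X, y, max_comp=5, cv=7):
    """PRESS (prediction error sum of squares) for PLS models with
    1..max_comp latent variables, from cv-fold cross-validation."""
    press = {}
    for a in range(1, max_comp + 1):
        yhat = cross_val_predict(PLSRegression(n_components=a), X, y, cv=cv)
        press[a] = float(np.sum((y - yhat.ravel()) ** 2))
    return press
```

Comparing PRESS across numbers of components is the usual way such cross-validation results are used to judge the predictive property of the model.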

One has to keep in mind that groups of objects found by any clustering procedure are not statistical samples from a certain distribution of data. Nevertheless, the groups or clusters are sometimes analyzed for their distinctness using statistical methods, e.g. by multivariate analysis of variance and discriminant analysis (see Section 5.6). As a result, one could then discuss only those clusters that are statistically different from the others. [Pg.157]

The goal of classification, also known as discriminant analysis or supervised learning, is to obtain rules that describe the separation between known groups of observations. Moreover, it allows the classification of new observations into one of the groups. We denote the number of groups by ℓ and assume that we can describe our experiment in each population π_j by a p-dimensional random variable X_j with distribution function (density) f_j. We write p_j for the membership probability, i.e., the probability for an observation to come from π_j. [Pg.207]
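With this notation the classical Bayes allocation rule can be stated (a textbook form added here for completeness, since the snippet stops before the rule itself): a new observation x is assigned to the group whose weighted density is largest,

```latex
\text{assign } \mathbf{x} \text{ to } \pi_j \quad\text{if}\quad
p_j\, f_j(\mathbf{x}) \;=\; \max_{k = 1,\dots,\ell}\; p_k\, f_k(\mathbf{x}).
```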

This supervised classification method, which is the most widely used, assumes a multivariate normal distribution for the variables in each population ((X1, ..., Xp) ~ N(μi, Σi)), and calculates the classification functions minimising the probability of incorrect classification of the observations of the training group (Bayesian-type rule). If multivariate normality is accepted together with equality of the k covariance matrices ((X1, ..., Xp) ~ N(μi, Σ)), Linear Discriminant Analysis (LDA) calculates... [Pg.701]
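The sentence is truncated at this point; a standard form of the linear classification functions LDA computes under these assumptions (written here from the usual theory, not from the truncated source) is

```latex
d_i(\mathbf{x}) \;=\; \boldsymbol{\mu}_i^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} \mathbf{x}
\;-\; \tfrac{1}{2}\, \boldsymbol{\mu}_i^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i
\;+\; \ln p_i ,
```

with x assigned to the population giving the largest d_i(x); in practice μi and Σ are replaced by their sample estimates from the training group.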

The discriminant analysis techniques discussed above rely for their effective use on a priori knowledge of the underlying parent distribution function of the variates. In analytical chemistry, the assumption of multivariate normal distribution may not be valid. A wide variety of techniques for pattern recognition not requiring any assumption regarding the distribution of the data have been proposed and employed in analytical spectroscopy. These methods are referred to as non-parametric methods. Most of these schemes are based on attempts to estimate P(x|g) and include histogram techniques, kernel estimates and expansion methods. One of the most common techniques is that of K-nearest neighbours. [Pg.138]
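A minimal K-nearest-neighbours sketch, which makes no assumption about the parent distribution (synthetic data; k = 5 is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 1.0, (50, 4)),   # class 0
    rng.normal(1.5, 1.0, (50, 4)),   # class 1
])
y = np.repeat([0, 1], 50)

knn = KNeighborsClassifier(n_neighbors=5)  # distribution-free classifier
print("5-fold CV accuracy:", cross_val_score(knn, X, y, cv=5).mean())
```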

Least squares models, 39, 158
Linear combination, normalized, 65
Linear combination of variables, 64
Linear discriminant analysis, 134
Linear discriminant function, 132
Linear interpolation, 47
Linear regression, 156
Loadings, factor, 74
Lorentzian distribution, 14... [Pg.215]

Fisher suggested transforming the multivariate observations x to another coordinate system that enhances the separation of the samples belonging to each class π [74]. Fisher's discriminant analysis (FDA) is optimal in terms of maximizing the separation among the set of classes. Suppose that there is a set of n (= n1 + n2 + ... + ng) m-dimensional (number of process variables) samples x1, ..., xn belonging to classes πi, i = 1, ..., g. The total scatter of the data points (St) consists of two types of scatter: within-class scatter Sw and between-class scatter Sb. The objective of the transformation proposed by Fisher is to maximize Sb while minimizing Sw. Fisher's approach does not require that the populations have normal distributions, but it implicitly assumes that the population covariance matrices are equal, because a pooled estimate of the common covariance matrix is used (Eq. 3.45). [Pg.53]
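The construction can be sketched in a few lines of numpy: accumulate Sw and Sb, then solve the generalized eigenvalue problem Sw⁻¹ Sb w = λ w (a simplified illustration of the idea, not the source's Eq. 3.45; it assumes Sw is invertible, i.e. more samples than variables):

```python
import numpy as np

def fisher_directions(X, y):
    """Directions maximizing between-class over within-class scatter."""
    m = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((m, m))  # within-class scatter
    Sb = np.zeros((m, m))  # between-class scatter
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        d = (mk - overall_mean).reshape(-1, 1)
        Sb += len(Xk) * (d @ d.T)
    # Generalized eigenproblem; at most g - 1 eigenvalues are nonzero.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evals.real[order], evecs.real[:, order]
```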

Because the deuterium distribution on the aromatic sites preserves the original state of the starting material, while the carbonyl deuterium is labile during production, only the aromatic deuterium abundance should be used as the primary data for the discriminant analysis of benzaldehyde products. [Pg.88]

