Arrangement of data

A single number, called a scalar, is not sufficient for data analysis. [Pg.144]

Vectors. A series of scalars can be arranged in a column or in a row; the result is called a column vector or a row vector. If the elements of a column vector can be assigned to specific items, e.g., to compounds, then data analysis can be carried out. The chemical structures of compounds can be characterized by numbers called descriptors, variables, predictors, or factors. For example, toxicity data were measured for a series of aromatic phenols. The toxicity values can be arranged in a column in arbitrary order. Each row corresponds to a phenolic compound. Many descriptors can be calculated for each compound (e.g., molecular mass, van der Waals volume, polarity parameters, quantum chemical descriptors, etc.). After building a multivariate model (generally, a single variable cannot encode the toxicity properly), we can predict toxicity values for phenolic compounds whose toxicity has not yet been measured. This approach is generally called the search for quantitative structure-activity relationships, or simply the QSAR approach. [Pg.144]
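
The arrangement described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the cited text: the descriptor values and toxicities are randomly generated placeholders, the helper names fit_linear and predict are hypothetical, and ordinary least squares stands in for whichever multivariate model a real QSAR study would use.

```python
import numpy as np

# Placeholder data only: random numbers standing in for real descriptors.
# Rows = compounds, columns = descriptors (e.g. molecular mass, volume, polarity).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))

# Column vector of "toxicity" values, one per compound (synthetic, not measured).
y = 0.5 + X @ np.array([1.0, -2.0, 0.3]) + rng.normal(scale=0.05, size=8)

def fit_linear(X, y):
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X_new):
    """Predict responses for new descriptor rows using fitted coefficients."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return coef[0] + X_new @ coef[1:]

coef = fit_linear(X, y)
# Predict for a compound with descriptors but no measured toxicity.
y_new = predict(coef, [[0.1, -0.4, 1.2]])
print(coef, y_new)
```

Each row of X lines up with the same row of y, which is the only property of the arrangement the fit relies on.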

Matrices. Column vectors placed side by side form a matrix. Generally, two kinds of matrices are distinguished, denoted X and Y. X denotes the matrix of independent variables; Y denotes the matrix of dependent variables, whose values are to be predicted. If the data can be arranged into a single (X) matrix, patterns in the data can still be uncovered in an unsupervised way, i.e., without using information about groupings present in the data. Such matrices are suitable for principal component analysis (PCA). [Pg.144]
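
As a rough sketch of the unsupervised case, PCA on a single X matrix can be computed from the singular value decomposition of the column-centered data. The function name pca and the random placeholder matrix are assumptions made for illustration; scaling choices such as autoscaling are left out.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix.

    Returns scores (objects x components), loadings (variables x components)
    and the fraction of variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Placeholder X: 10 objects (rows) described by 4 variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
scores, loadings, explained = pca(X)
print(scores.shape, loadings.shape, explained)
```

No Y matrix and no group labels enter the computation, which is what makes the analysis unsupervised.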

Matrices (arrays) can be multidimensional; three-dimensional arrays are also called tensors. Analysis of tensors is frequently called 3-way analysis. A typical example is data from a hyphenated technique, e.g., gas chromatography-mass spectrometry (GC-MS): one direction (way) is the mass spectrum, the second direction is the chromatographic separation (time, scan), and the third direction is the samples (of different origin, repetitions, calibration series, etc.). 3-way analysis can easily be generalized to n-way analysis with more directions. [Pg.144]
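
A 3-way GC-MS array of this kind might be held as a three-dimensional NumPy array. The axis order (samples, scans, m/z channels), the sizes, and the random intensities below are assumptions chosen only to show how each way is addressed.

```python
import numpy as np

# Assumed layout: axis 0 = samples, axis 1 = scans (time), axis 2 = m/z channels.
n_samples, n_scans, n_mz = 3, 5, 4
rng = np.random.default_rng(0)
data = rng.random((n_samples, n_scans, n_mz))   # placeholder intensities

mass_spectrum = data[0, 2, :]   # spectrum of sample 0 at scan 2
ion_trace = data[0, :, 1]       # m/z channel 1 of sample 0 over time
tic = data.sum(axis=2)          # total ion chromatogram for every sample

print(mass_spectrum.shape, ion_trace.shape, tic.shape)
```

Adding a fourth axis, for instance for replicate injections, would turn the same structure into a 4-way array.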

3-Way analyses require routine use of matrix operations; in addition, 3-way arrays can be unfolded into 2-way arrays (matrices). Therefore, we deal with the analysis of matrices from here on. [Pg.145]
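
Unfolding can be illustrated with a reshape. The helper unfold below is a hypothetical name, and the column ordering it produces is one common convention among several; other software may order the combined ways differently.

```python
import numpy as np

def unfold(T, mode):
    """Unfold a 3-way array into a matrix.

    The chosen mode becomes the rows; the two remaining ways are
    combined (in their original order) into the columns.
    """
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Small integer array so the rearrangement is easy to follow by eye.
T = np.arange(24).reshape(2, 3, 4)

X0 = unfold(T, 0)   # 2 rows, 3*4 columns
X1 = unfold(T, 1)   # 3 rows, 2*4 columns
X2 = unfold(T, 2)   # 4 rows, 2*3 columns
print(X0.shape, X1.shape, X2.shape)
```

Once unfolded, the data can be passed to any method that expects an ordinary matrix, at the cost of no longer treating the two combined ways separately.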

FIGURE 11.4 Two-way analysis of variance. Arrangement of data in rows and columns such that each row of the cell culture plate (shown at the top of the figure) defines a single dose-response curve to the agonist. Also, data is arranged by plate in that each plate defines eight dose-response curves and the total data set is comprised of 32 dose-response curves. The possible effect of location with respect to row on the plate and/or which plate (order of plate analysis) can be tested with the two-way analysis of variance. [Pg.233]
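
A two-way analysis of variance on such a layout can be sketched as follows. The sketch assumes that each of the 32 curves has been reduced to a single summary value (for example a fitted potency), arranged as 8 plate rows by 4 plates; that reduction, the function name two_way_anova, and the random placeholder values are all assumptions, not details from the cited figure.

```python
import numpy as np

def two_way_anova(table):
    """Two-way ANOVA without replication (one value per cell).

    table rows = first factor (plate row), columns = second factor (plate).
    Returns F ratios, degrees of freedom, and sums of squares.
    """
    table = np.asarray(table, dtype=float)
    r, c = table.shape
    grand = table.mean()
    ss_rows = c * np.sum((table.mean(axis=1) - grand) ** 2)
    ss_cols = r * np.sum((table.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((table - grand) ** 2)
    ss_resid = ss_total - ss_rows - ss_cols
    df_rows, df_cols = r - 1, c - 1
    df_resid = df_rows * df_cols
    ms_resid = ss_resid / df_resid
    return {
        "F_rows": (ss_rows / df_rows) / ms_resid,
        "F_cols": (ss_cols / df_cols) / ms_resid,
        "df": (df_rows, df_cols, df_resid),
        "ss": (ss_rows, ss_cols, ss_resid, ss_total),
    }

# Placeholder summary values: 8 plate rows x 4 plates = 32 curves.
rng = np.random.default_rng(0)
values = 7.0 + rng.normal(scale=0.1, size=(8, 4))
result = two_way_anova(values)
print(result["F_rows"], result["F_cols"], result["df"])
```

The two F ratios would then be compared against the F distribution with the listed degrees of freedom to judge whether row position or plate has an effect.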

Fig. 8.25. Various topological arrangements of data which are arithmetically identical...

Table 2.7. Arrangement of data and calculations for a paired t-test...

Figure 1.18 Arrangement of data in columns, rows and layers to represent three-way ANOVA...

TABLE 9-4 Arrangement of Data of Chemical Mutagenic Potency in V79 Chinese Hamster Cells3... [Pg.217]

TABLE 9-5 Arrangement of Data of Chemical Potency in the Salmonella/microsome Test ... [Pg.219]

It is important to realize that many of these considerations are not only important for GPU programming. The arrangement of data in a data-parallel fashion, for example, is also important for parallel programming of distributed memory architectures, which are found in most of today's standard CPU clusters. Thus many of the techniques employed to improve the parallel efficiency of quantum chemistry codes are also applicable to GPUs. The same holds for the optimization of memory access patterns. A general... [Pg.23]

The concept of squared distances has important functional consequences for how the value of the correlation coefficient reacts to specific arrangements of data. The significance of a correlation is based on the assumption that the residual values (i.e., the deviations from the regression line) for the dependent variable y follow the normal distribution and that the variability of the residual values is the same for all values of the independent variable. However, Monte Carlo studies have shown that meeting these assumptions closely is not crucial if the sample size is very large. Serious biases are unlikely if the sample size is 50 or more; normality can be assumed if the sample size exceeds 100. [Pg.86]
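
The sensitivity of the correlation coefficient to particular arrangements of data can be seen in a small constructed example. The numbers below are invented for illustration: six points with almost no linear trend, then the same points plus one distant point. The helper pearson_r is a hypothetical name for the usual product-moment formula.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation from centered sums of products and squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Invented data: no clear linear trend among the six points.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 2, 3, 1, 2]
r_base = pearson_r(x, y)                    # roughly -0.24

# One distant point dominates the squared deviations.
r_outlier = pearson_r(x + [20], y + [20])   # roughly 0.95

print(round(r_base, 3), round(r_outlier, 3))
```

Because deviations enter the formula squared, the single distant point contributes far more than the other six together, which is why the coefficient jumps from weakly negative to strongly positive.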

Arrangement of data acquisition system of injecting liquid CO ... [Pg.34]

Big Chemical Encyclopedia
